# English Visual Interaction
Qwen2 VL 2B Instruct GGUF
Apache-2.0
Qwen2-VL-2B-Instruct is a 2B-parameter multimodal vision-language model based on the Qwen2 architecture that supports image-text generation tasks.
Image-to-Text English
second-state
125
3
Florence 2 VLM Doc VQA
A Visual Question Answering (VQA) model fine-tuned from microsoft/Florence-2-base-ft that interprets image content and answers questions about it.
Text-to-Image
Transformers English

prithivMLmods
69
4